Change from TextFileReader to ParquetStreamReader #348
JanWillruth wants to merge 49 commits into glamod:main
…ng of files larger than RAM when chunk_size is specified; Rework affected Databundle code
…ames; Remove unneeded TextFileReader tests from test_pandas.py
@JanWillruth: I ran some high-performance tests. This PR does not really affect the maximum memory usage, but it speeds up the code a little bit. Nevertheless, we should use this PR for readability reasons. Do you want to move the |
I merged #360 into the |
# Conflicts:
#	cdm_reader_mapper/mdf_reader/utils/utilities.py
#	tests/test_reader_utilities.py
Hi @jtsiddons, what do you think about this PR? Any ideas for improvements or general comments?
Thanks @ludwiglierhammer - I've scheduled some time to have a look this afternoon |
Hi @jtsiddons, we could replace all TextFileReader elements with the new ParquetStreamReader. Do you have any further suggestions for this PR? We would appreciate your review.
… and Iterable of pd.DataFrame
To do
- TextFileReader to ParquetStreamReader for (better) handling of files larger than RAM when chunk_size is specified
- Databundle code
- ParquetStreamReader from mdf_reader.utils.utilities to common.iterators
- cdm_mapper.mapper code
- common.select code
- common.inspect code
- metmetpy.validate code
- metmetpy.correct code
- mdf_reader.utilities.utils code
- mdf_reader.writer code
- core._utilities._copy code
- ParquetStreamReader option to common.replace code
- common.pandas_TextParser_hdlr
- mypy hook #368
Ideas
- write a decorator so that we can call a function in the "normal" way, but the decorator decides whether to execute the function or pass it to common.iterators.process_disk_backed
- apply_or_chunk but as a decorator
- my_func(DataFrame, *args, **kwargs) and my_func(ParquetStreamReader, *args, **kwargs)
Issues
This PR addresses open issues:
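Regarding the decorator idea listed under Ideas, here is a minimal sketch of how such a dispatch could look. It is not the implementation in this PR. `ChunkStream` is a hypothetical stand-in for ParquetStreamReader, plain lists stand in for pd.DataFrame so that the sketch runs without pandas, and `chunk_aware` and `scale` are made-up names. A real version would presumably hand the chunked case to common.iterators.process_disk_backed, whose signature is not assumed here.

```python
from collections.abc import Callable, Iterable, Iterator
from functools import wraps
from typing import Any


class ChunkStream:
    """Hypothetical stand-in for ParquetStreamReader: a one-pass iterator of chunks."""

    def __init__(self, chunks: Iterable[Any]) -> None:
        self._chunks = iter(chunks)

    def __iter__(self) -> Iterator[Any]:
        return self

    def __next__(self) -> Any:
        return next(self._chunks)


def chunk_aware(func: Callable[..., Any]) -> Callable[..., Any]:
    """Call func directly on a single object, or lazily on every chunk of a stream."""

    @wraps(func)
    def wrapper(data: Any, *args: Any, **kwargs: Any) -> Any:
        if isinstance(data, ChunkStream):
            # The generator expression processes one chunk at a time, on demand,
            # so the full data set never has to be held in memory at once.
            return ChunkStream(func(chunk, *args, **kwargs) for chunk in data)
        return func(data, *args, **kwargs)

    return wrapper


@chunk_aware
def scale(rows: list[float], factor: float = 1.0) -> list[float]:
    # Plain lists stand in for pd.DataFrame in this sketch.
    return [value * factor for value in rows]


# Same call signature for both input kinds.
whole = scale([1.0, 2.0, 3.0], factor=2.0)
streamed = scale(ChunkStream([[1.0, 2.0], [3.0]]), factor=2.0)
streamed_chunks = list(streamed)
```

With this shape, my_func(DataFrame, *args, **kwargs) and my_func(ParquetStreamReader, *args, **kwargs) would share one function body, and the chunked call would return another stream instead of a materialized result.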